
    Accurate estimation of homologue-specific DNA concentration-ratios in cancer samples allows long-range haplotyping

    Interpretation of allelic copy measurements at polymorphic markers in cancer samples presents distinctive challenges and opportunities. Due to frequent gross chromosomal alterations occurring in cancer (aneuploidy), many genomic regions are present at homologous-allele imbalance. Within such regions, the unequal contribution of alleles at heterozygous markers allows for direct phasing of the haplotype derived from each individual parent. In addition, genome-wide estimates of homologue-specific copy-ratios (HSCRs) are important for interpretation of the cancer genome in terms of fixed integral copy-numbers. We describe HAPSEG, a probabilistic method to interpret bi-allelic marker data in cancer samples. HAPSEG operates by partitioning the genome into segments of distinct copy number and modeling the four distinct genotypes in each segment. We describe general methods for fitting these models to data which are suitable for both SNP microarrays and massively parallel sequencing data. In addition, we demonstrate a specially tailored error-model for interpretation of systematic variations arising in microarray platforms. The ability to directly determine haplotypes from cancer samples represents an opportunity to expand reference panels of phased chromosomes, which may have general interest in various population genetic applications. In addition, this property may be exploited to interrogate the relationship between germline risk and cancer phenotype with greater sensitivity than is possible using unphased genotype. Finally, we exploit the statistical dependency of phased genotypes to enable the fitting of more elaborate sample-level error-model parameters, allowing more accurate estimation of HSCRs in cancer samples.
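The phasing idea in the abstract above (alleles at heterozygous markers contribute unequally inside a region of allelic imbalance) can be illustrated with a toy sketch. This is not HAPSEG, which the abstract describes as a probabilistic model. The hard threshold `min_imbalance`, the function names, and the input format (one A-allele fraction per heterozygous marker in a single segment) are all assumptions made for illustration.

```python
from typing import List


def phase_segment(a_fractions: List[float], min_imbalance: float = 0.1) -> List[str]:
    """Assign each heterozygous marker in one segment to a homologue.

    a_fractions holds the observed fraction of the A allele at each marker.
    Returns 'A' when the A allele appears to lie on the over-represented
    homologue, 'B' when the B allele does, and '?' when the imbalance is too
    weak to call (hypothetical hard threshold, not a probabilistic model).
    """
    calls = []
    for frac in a_fractions:
        if abs(frac - 0.5) < min_imbalance:
            calls.append("?")
        elif frac > 0.5:
            calls.append("A")
        else:
            calls.append("B")
    return calls


def major_homologue_fraction(a_fractions: List[float], min_imbalance: float = 0.1) -> float:
    """Average fraction contributed by the over-represented homologue,
    using only the markers that phase_segment was able to call."""
    calls = phase_segment(a_fractions, min_imbalance)
    major = [max(f, 1.0 - f) for f, c in zip(a_fractions, calls) if c != "?"]
    if not major:
        raise ValueError("no marker shows enough imbalance to phase")
    return sum(major) / len(major)


# Toy segment: three clearly imbalanced markers and one ambiguous marker.
segment = [0.67, 0.31, 0.52, 0.70]
calls = phase_segment(segment)
ratio = major_homologue_fraction(segment)
```

In this sketch the per-marker calls along one segment form the haplotype of the over-represented homologue, and the averaged fraction is a crude stand-in for a homologue-specific ratio.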

    Finding Motifs in Promoter Regions


    Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squamous, head and neck, colorectal, and endometrial carcinomas. Many somatic retrotransposon insertions occur in known cancer genes. We find that high somatic retrotransposition rates in tumors are associated with high rates of genomic rearrangement and somatic mutation. Finally, we developed TranspoSeq-Exome to interrogate an additional 767 tumor samples with hybrid-capture exome data and discovered 35 novel somatic retrotransposon insertions into exonic regions, including an insertion into an exon of the PTEN tumor suppressor gene. The results of this large-scale, comprehensive analysis of retrotransposon movement across tumor types suggest that somatic retrotransposon insertions may represent an important class of structural variation in cancer. National Cancer Institute (U.S.) (grant U24CA143867); National Cancer Institute (U.S.) (grant U24CA126546)

    Comprehensive assessment of cancer missense mutation clustering in protein structures

    Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations. Keywords: cancer; cancer genetics; mutation clustering; protein structures; interaction interfaces. National Institutes of Health (U.S.) (Grant U24 CA143845)
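The abstract above does not state which clustering statistic the authors use. As a hedged illustration of what "significant 3D clustering" can mean, the sketch below swaps in a generic permutation test: it compares the mean pairwise distance between mutated residues with the same quantity for randomly chosen residues of the same protein. The function names, the input format (a mapping from residue index to one 3D coordinate), and the toy straight-chain "structure" are all assumptions.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def mean_pairwise_distance(coords: List[Coord]) -> float:
    """Mean Euclidean distance over all pairs of points (needs >= 2 points)."""
    pairs = list(combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_p_value(
    residue_coords: Dict[int, Coord],
    mutated: List[int],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for spatial clustering.

    Null model (an assumption): mutated positions are drawn uniformly,
    without replacement, from all residues that have coordinates.
    A small p-value means the mutated residues sit closer together
    than random residue sets of the same size.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([residue_coords[r] for r in mutated])
    residues = sorted(residue_coords)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(residues, len(mutated))
        null_stat = mean_pairwise_distance([residue_coords[r] for r in sample])
        if null_stat <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy "structure": 100 residues on a straight line, 3.8 units apart.
chain = {i: (3.8 * i, 0.0, 0.0) for i in range(100)}
p_clustered = clustering_p_value(chain, [10, 11, 12, 13])
p_spread = clustering_p_value(chain, [0, 33, 66, 99])
```

With real data the coordinates would come from a solved structure, and a realistic null model would also need to account for mutation counts per residue and sequence context, which this sketch ignores.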

    Gene expression analysis reveals a strong signature of an interferon induced pathway in childhood lymphoblastic leukemia as well as in breast and ovarian cancer

    On the basis of epidemiological studies, infection was suggested to play a role in the etiology of human cancer. While for some cancers such a role was indeed demonstrated, there is no direct biological support for the role of viral pathogens in the pathogenesis of childhood leukemia. Using a novel bioinformatic tool that alternates between clustering and standard statistical methods of analysis, we performed a "double blind" search of published gene expression data of subjects with different childhood ALL subtypes, looking for unanticipated partitions of patients, induced by unexpected groups of genes with correlated expression. We discovered a group of about thirty genes, related to the interferon response pathway, whose expression levels divide the ALL samples into two subgroups: high in 50 and low in 285 patients. Leukemic subclasses prevalent in early childhood (the age most susceptible to infection) are over-represented in the high expression subgroup. Similar partitions, induced by the same genes, were also found in breast and ovarian cancer but not in lung cancer, prostate cancer and lymphoma. About 40% of breast cancer samples expressed the "interferon-related" signature. It is of interest that several studies demonstrated MMTV-like sequences in about 40% of breast cancer samples. Our discovery of an unanticipated strong signature of an interferon-induced pathway provides molecular support for a role for either inflammation or viral infection in the pathogenesis of childhood leukemia as well as breast and ovarian cancer.